ExpOmics: a comprehensive web platform empowering biologists with robust multi-omics data analysis capabilities

Abstract

Motivation
High-throughput technologies yield a broad spectrum of multi-omics datasets, which offer unparalleled insights into complex biological systems. However, effectively analyzing this diverse array of data presents challenges, considering factors such as species diversity, data types, costs, and limitations of the available tools.

Results
Herein, we present ExpOmics, a comprehensive web platform featuring 7 applications and 4 toolkits, with 28 customizable analysis functions spanning various analyses of differential expression, co-expression, Weighted Gene Co-expression Network Analysis (WGCNA), feature selection, and functional enrichment. ExpOmics allows users to upload and explore multi-omics data without organism restrictions, supporting various expression data, including genes, mRNAs, lncRNAs, miRNAs, circRNAs, piRNAs, and proteins, and is compatible with diverse gene nomenclatures and expression values. Moreover, ExpOmics enables users to analyze 22 427 transcriptomic datasets of 196 cancer subtypes sourced from 63 projects of The Cancer Genome Atlas Program (TCGA) to identify cancer biomarkers. The analysis results from ExpOmics are presented in high-quality graphical formats suitable for publication and are available for free download. A case study using ExpOmics identified two potential oncogenes, SERPINE1 and SLC43A1, that may regulate colorectal cancer through distinct biological processes. In summary, ExpOmics can serve as a robust platform for global researchers to explore multi-omics data, gain biological insights, and formulate testable hypotheses.

Availability and implementation
ExpOmics is available at http://www.biomedical-web.com/expomics.


Introduction
Multi-omics data generated by high-throughput technologies, such as next-generation sequencing and mass spectrometry, are invaluable resources for understanding complex biological systems (Wang et al. 2009, Stark et al. 2019). However, despite their potential, mining and analyzing such data remain challenging because of the high costs and technical barriers involved. For instance, the complexity and diversity of multi-omics datasets, including gene, messenger RNA (mRNA), long non-coding RNA (lncRNA), microRNA (miRNA), circular RNA (circRNA), piwi-interacting RNA (piRNA), and protein expression data, present a considerable challenge for biologists lacking bioinformatics expertise (Mougin et al. 2018). Furthermore, customizable data analysis is often required to address specific biological questions, emphasizing the necessity for user-friendly web platforms to enable the efficient investigation of these complex and diverse data and advance life science research (O'Donoghue et al. 2010, Mougin et al. 2018). To address these challenges, bioinformaticians have developed several web platforms to explore and analyze multi-omics data (Chang et al. 2018, Ge et al. 2018, Cheng et al. 2021, Conard et al. 2021, Zhou et al. 2021, Liu et al. 2022).
However, critical issues remain to be addressed. For instance, ExpressVis (Liu et al. 2022), eVITTA (Cheng et al. 2021), PANDA-view (Chang et al. 2018), iDEP (Ge et al. 2018), and TIMEOR (Conard et al. 2021) primarily focus on gene and protein expression data and cater to specific species, such as Homo sapiens and model organisms. Consequently, they cannot handle non-coding RNA expression data, including lncRNAs, circRNAs, miRNAs, and piRNAs, or data from other organisms. Moreover, they lack essential analytical capabilities, such as Weighted Gene Co-expression Network Analysis (WGCNA), feature selection, and integration with large-scale international omics projects, such as The Cancer Genome Atlas (TCGA) (Weinstein et al. 2013). Furthermore, they provide relatively limited functionality for differential expression and functional enrichment analyses and cannot meet the demands of complex analyses.
To fill these gaps, we present ExpOmics, an easy-to-use web platform featuring applications designed to assist biologists in efficiently processing various types of expression data without requiring programming skills. ExpOmics offers robust multi-omics data analysis and visualization capabilities for exploring gene, mRNA, lncRNA, miRNA, circRNA, piRNA, and protein expression data, covering various aspects of differential expression, co-expression, WGCNA, feature selection, and functional enrichment analysis. ExpOmics allows users to upload expression data with diverse gene nomenclatures from different resources, and supports various types of expression values and organisms. In addition, ExpOmics integrates 22 427 transcriptomic datasets of 196 cancer subtypes from the TCGA. This integration empowers biologists to thoroughly explore data, thereby facilitating the discovery and validation of cancer biomarkers and targets. In summary, ExpOmics serves as a valuable web platform for equipping biologists worldwide with robust multi-omics data analysis capabilities.

Implementation of ExpOmics
The ExpOmics web platform was developed using a framework that separates the front end from the back end, as previously described (Zhang et al. 2022a, Zhang et al. 2022b). Within ExpOmics, 7 distinct applications (GeneExplyzer, Transcriptlyzer, miRExplyzer, circExplyzer, piRExplyzer, ProteinExplyzer, and TCGAExplyzer) with 4 toolkits (DiffExpToolkit, CorrExpToolkit, WGCNAToolkit, and FeatureSelectToolkit) were developed using R (version 4.3.2) (Fig. 1 and Table 1) and integrated to provide a comprehensive suite of functionalities (Table 2). The implementation of these applications and toolkits relies on a selected set of R packages and resources, which are detailed in Table 1 and Supplementary Tables S1 and S2.

Implementation of GeneExplyzer and Transcriptlyzer
For GeneExplyzer and Transcriptlyzer, gene annotations spanning 13 organisms were curated from the National Center for Biotechnology Information (NCBI) GenBank (Sayers et al. 2024). These annotations included essential details, such as gene symbols, Entrez gene identifiers (IDs), Ensembl gene IDs, transcript symbols, RefSeq accessions, Ensembl transcript IDs, and gene and transcript lengths. By leveraging these annotations, we developed the GeneExplyzer and Transcriptlyzer applications using R (Fig. 1, Table 1, and Supplementary Table S1).

Implementation of miRExplyzer
To implement miRExplyzer, miRNA annotations spanning 31 organisms were extracted from the miRBase database (Kozomara et al. 2019). These annotations included the primary miRNA ID, mature miRNA ID, primary miRNA symbol, mature miRNA symbol, and other relevant details. Subsequently, by leveraging the extracted miRNA annotations, we developed miRExplyzer using R (Fig. 1, Table 1, and Supplementary Table S1).

Implementation of piRExplyzer
To implement piRExplyzer, piRNA annotations spanning 43 organisms were extracted from piRBase (Wang et al. 2022). These annotations included piRNA ID, piRNA gene ID, and symbols in GenBank, along with other relevant details. By leveraging the extracted piRNA annotations, we developed piRExplyzer using R (Fig. 1, Table 1, and Supplementary Table S1).

Implementation of ProteinExplyzer
For ProteinExplyzer, the protein annotations of 16 organisms were extracted from UniProt (UniProt Consortium 2023). These annotations included protein IDs and symbols in databases such as UniProt, Ensembl, and RefSeq, along with their host genes and other relevant details. By leveraging this information, ProteinExplyzer was developed using R (Fig. 1, Table 1, and Supplementary Table S1).

Implementation of TCGAExplyzer
To implement TCGAExplyzer, we extracted and curated gene expression data and sample metadata from TCGA via the Genomic Data Commons (GDC) portal (Weinstein et al. 2013), encompassing 63 projects and 22 427 transcriptomic datasets across 196 cancer subtypes (Table 1). In addition, 4 toolkits featuring 28 analysis functions were developed and incorporated into TCGAExplyzer using R (Fig. 1, Table 1, and Supplementary Table S1).

Implementation of DiffExpToolkit, CorrExpToolkit, WGCNAToolkit, and FeatureSelectToolkit
The 4 toolkits (DiffExpToolkit, CorrExpToolkit, WGCNAToolkit, and FeatureSelectToolkit) were designed to provide 28 analysis functions for various aspects of differential expression, co-expression, WGCNA, and feature selection analysis (Fig. 1 and Table 2). These functions enable users to select a project, assign samples to predefined groups, and set parameters for custom analysis. A list of the R packages and resources used to implement these toolkits and their analytical functions is provided in Supplementary Table S2.
Additionally, users can set the organism and gene nomenclature parameters to "other" for scenarios not covered by the default options. Once standardization and data upload are complete, the standardized expression data are stored and assigned a unique identifier for tracking and subsequent analyses. ExpOmics also provides a data removal feature, which allows users to delete their uploaded data using the assigned unique identifier, ensuring data security. Each application is equipped with 4 toolkits (DiffExpToolkit, CorrExpToolkit, WGCNAToolkit, and FeatureSelectToolkit) comprising 28 analytical functions for the comprehensive exploration of gene, mRNA, lncRNA, miRNA, circRNA, piRNA, and protein expression data across various organisms (Fig. 2b and Tables 1 and 2). Moreover, all these analysis functions enable users to select a project, assign samples to predefined groups, and set parameters for custom analysis based on the sample metadata of the dataset.
In contrast to the GeneExplyzer, Transcriptlyzer, miRExplyzer, circExplyzer, piRExplyzer, and ProteinExplyzer applications, TCGAExplyzer focuses on the comprehensive and custom analysis of 22 427 transcriptomic datasets from 63 projects and 196 cancer subtypes in TCGA (Fig. 2c). The results of these analysis functions are presented in high-quality graphical formats suitable for publication and are available for free download. In addition, the results tables are equipped with filtering and sorting capabilities, allowing users to explore the data easily.
Detailed user guidance and video tutorials are available on the "Help" page and individual application webpages. Extensive tests were conducted across popular web browsers (Google Chrome, Safari, Microsoft Edge, and Firefox) to ensure optimal performance. The details of these applications and toolkits are described below.

Table 1.
Details of the implementation and utilization of the 7 applications in ExpOmics.

GeneExplyzer and Transcriptlyzer
GeneExplyzer and Transcriptlyzer remove duplicate genes or mRNAs/lncRNAs from user-uploaded expression data and convert the remaining entries into standardized gene or transcript symbols; they are applicable to the listed 13 organisms and others. Moreover, they normalize different expression values, including read counts, Reads Per Kilobase per million mapped reads (RPKM), and Fragments Per Kilobase of exon model per Million mapped fragments (FPKM), into Transcripts Per Million (TPM) and apply a log2 transformation for downstream analysis.
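The normalization described above can be illustrated with a minimal Python sketch. It is not the ExpOmics implementation (the platform is written in R); the function names and the pseudocount of 1 used before the log2 transformation are assumptions made for illustration, and the formulas are the standard definitions of TPM.

```python
import math

def counts_to_tpm(counts, lengths_bp):
    """Convert the raw read counts of one sample to TPM.

    counts and lengths_bp are parallel lists (one entry per gene or transcript);
    lengths are given in base pairs.
    """
    # Length-normalize first: reads per kilobase of feature length.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    # Rescale so that the values of the sample sum to one million.
    return [r / total * 1e6 for r in rpk]

def fpkm_to_tpm(fpkm):
    """Rescale the RPKM or FPKM values of one sample so that they sum to one million."""
    total = sum(fpkm)
    if total == 0:
        return [0.0] * len(fpkm)
    return [v / total * 1e6 for v in fpkm]

def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount (assumed here) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

# Hypothetical sample: three features whose counts are proportional to length.
tpm = counts_to_tpm([10, 20, 30], [1000, 2000, 3000])
log_tpm = log2_transform(tpm)
```

Because RPKM and FPKM are already length-normalized, converting them to TPM only requires rescaling each sample to a total of one million, whereas raw counts must first be divided by feature length.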

miRExplyzer
miRExplyzer removes duplicate miRNA IDs from user-uploaded expression data and converts the remaining entries into standardized miRNA symbols; it is applicable to the listed 31 organisms and others. Moreover, miRExplyzer normalizes different expression values: read counts are converted to Reads Per Million (RPM) with log2 transformation, and RPKM and FPKM are converted to TPM with log2 transformation.

circExplyzer
circExplyzer removes duplicate circRNAs from user-uploaded expression data and converts the remaining entries into standardized circRNA IDs; it is applicable to the listed 10 organisms and others. Additionally, circExplyzer normalizes different expression values: junction read counts are converted to Spliced Reads Per Billion Mapping (SRPBM) with log2 transformation, and RPKM and FPKM are converted to TPM with log2 transformation.
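Both RPM and SRPBM are library-size scalings that differ only in the scaling constant. The Python sketch below shows the arithmetic under stated assumptions: the function names are hypothetical, and the caller is assumed to supply the total number of mapped reads for the sample; it is not taken from the ExpOmics code.

```python
def scale_per_n(read_counts, total_mapped_reads, n):
    """Scale the read counts of one sample to 'reads per n mapped reads'."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return [c * n / total_mapped_reads for c in read_counts]

def rpm(read_counts, total_mapped_reads):
    """Reads Per Million, as commonly used for miRNA and piRNA read counts."""
    return scale_per_n(read_counts, total_mapped_reads, 1e6)

def srpbm(junction_read_counts, total_mapped_reads):
    """Spliced Reads Per Billion Mapping, as commonly used for circRNA back-splice junction reads."""
    return scale_per_n(junction_read_counts, total_mapped_reads, 1e9)

# Hypothetical sample with 40 million mapped reads.
mirna_rpm = rpm([120, 4000], 40_000_000)
circ_srpbm = srpbm([2, 15], 40_000_000)
```

The per-billion constant for circRNAs reflects that back-splice junction reads are typically far rarer than linear reads, so a larger scaling factor keeps the values in a convenient range.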

piRExplyzer
piRExplyzer removes duplicate piRNAs from user-uploaded expression data and converts the remaining entries into standardized piRNA IDs; it is applicable to the listed 43 organisms and others. Additionally, piRExplyzer normalizes diverse expression values: read counts are converted to RPM with log2 transformation, and RPKM and FPKM are converted to TPM with log2 transformation.

ProteinExplyzer
ProteinExplyzer removes duplicate proteins from user-uploaded expression data and converts the remaining entries into standardized protein IDs combined with their host gene symbols for easier recognition; it is applicable to the listed 16 organisms and others. Additionally, ProteinExplyzer applies a log2 transformation to the uploaded expression data if the transformation was not applied initially.
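The text does not state how duplicates are resolved or how previously log-transformed data are recognized, so the following Python sketch only illustrates one plausible approach. Two assumptions are made: among duplicated IDs, the row with the highest mean expression is kept, and a matrix whose largest value exceeds a hypothetical threshold of 100 is treated as not yet log-transformed.

```python
import math

def remove_duplicates(rows):
    """rows is a list of (feature_id, values) pairs.

    Assumption: for a repeated ID, keep the row with the highest mean expression.
    """
    best = {}
    for feature_id, values in rows:
        mean = sum(values) / len(values)
        if feature_id not in best or mean > best[feature_id][0]:
            best[feature_id] = (mean, values)
    return {fid: values for fid, (_, values) in best.items()}

def looks_log_transformed(matrix, raw_threshold=100.0):
    """Heuristic (assumed): log2-scale intensities rarely exceed the threshold."""
    largest = max(v for row in matrix for v in row)
    return largest <= raw_threshold

def ensure_log2(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) only if the data do not already look log-scaled."""
    if looks_log_transformed(matrix):
        return [list(row) for row in matrix]
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]

# Hypothetical protein table with one duplicated ID.
table = remove_duplicates([("P1", [1, 3]), ("P2", [5, 5]), ("P1", [4, 6])])
logged = ensure_log2([[0, 1023], [3, 7]])
```

A threshold heuristic of this kind can misclassify unusual datasets, which is one reason a platform might also let users declare the scale of their data explicitly.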

TCGAExplyzer
TCGAExplyzer provides 4 toolkits with 28 custom analysis functions for the comprehensive analysis of TCGA transcriptomic data to expedite the discovery of cancer biomarkers and targets.

WGCNAToolkit
The WGCNAToolkit enables users to conduct WGCNA, including network construction, module detection, gene selection, topological property calculations, data simulation, and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, on the expression data of genes, mRNAs, lncRNAs, miRNAs, circRNAs, piRNAs, and proteins from different organisms.
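The network construction step of WGCNA can be sketched as follows. This Python example illustrates the general technique (an unsigned adjacency obtained by raising the absolute Pearson correlation to a soft-thresholding power), not the toolkit's code; the default power of 6 and the function names are assumptions, and the diagonal is set to zero so that connectivity excludes self-connections.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)

def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for a list of gene profiles."""
    n = len(profiles)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(profiles[i], profiles[j])) ** beta
            adjacency[i][j] = adjacency[j][i] = a
    return adjacency

def connectivity(adjacency):
    """Whole-network connectivity k_i: the sum of a gene's adjacencies to all other genes."""
    return [sum(row) for row in adjacency]

# Hypothetical profiles: four features measured in four samples.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, -1, 1, -1]]
adj = soft_threshold_adjacency(profiles)
k = connectivity(adj)
```

Raising correlations to a power suppresses weak correlations while preserving strong ones, which is what allows modules of tightly co-expressed features to be detected in the subsequent clustering step.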

FeatureSelectToolkit
The FeatureSelectToolkit provides 4 analysis functions for conducting various aspects of feature selection analysis on the expression data of genes, mRNAs, lncRNAs, miRNAs, circRNAs, piRNAs, and proteins from different organisms to identify important gene features. These functions include COX regression, LASSO, ROCCurve, and Survival.
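As an illustration of what a ROC-based function evaluates, the area under the ROC curve (AUC) for a single feature can be computed with the rank-based identity below. This is a minimal Python sketch with hypothetical names and labels (1 for tumor, 0 for normal), not the ROCCurve implementation.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive sample scores higher than a negative one.

    Ties between a positive and a negative sample count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical expression values of one feature in two normal and two tumor samples.
auc = roc_auc([1.0, 3.0, 2.0, 4.0], [0, 0, 1, 1])
```

An AUC near 1 indicates that the feature separates the two groups almost perfectly, whereas a value near 0.5 indicates no discriminative power.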

Case study
A case study was conducted to demonstrate how ExpOmics benefits biologists. To this end, a dataset (accession no. GSE211496) from the NCBI Gene Expression Omnibus (GEO) was downloaded and uploaded to ExpOmics for further analyses. This dataset provides the microarray expression profiles of mRNAs, lncRNAs, and circRNAs of human colorectal cancer (CRC). Differential analysis of the GSE211496 and TCGA-COAD datasets identified 15 highly variable genes and their corresponding circRNAs that were upregulated in CRC tumor tissues compared to adjacent tissues (log2 fold-change >1, P < .05) (Fig. 3a and b, Supplementary Fig. S1A). Most of the 15 genes showed significant mutual correlations in the correlation analysis of the TCGA-COAD dataset (Fig. 3c and Supplementary Fig. S1B). Moreover, survival analysis indicated that only SERPINE1 (PAI-1) and SLC43A1 were associated with poor prognosis in patients with CRC (Fig. 3d and Supplementary Fig. S2). Additionally, Receiver Operating Characteristic (ROC) analysis indicated that these genes could independently distinguish between normal adjacent tissues and tumor tissues (Fig. 3e). However, their expression levels were not correlated in either paired or unpaired analyses, suggesting that they may influence CRC development through different molecular pathways (Fig. 3f). Subsequent WGCNA identified multiple co-expression gene modules (Fig. 3g) related to CRC, including MEred, MEyellow, MElightcyan, MEgrey60, MEgreenyellow, and MEgrey.
Further analysis revealed that SERPINE1 and SLC43A1 were located in different modules: SERPINE1 in the MEgrey module and SLC43A1 in the MEgreenyellow module. Functional enrichment analysis indicated that genes in the MEgrey module, where SERPINE1 is located, were mainly associated with the cytoskeleton in muscle cells, focal adhesion, Extracellular Matrix (ECM)-receptor interaction, and the PI3K-Akt signaling pathway (Fig. 3h). In contrast, genes in the MEgreenyellow module, where SLC43A1 is located, were found to be primarily involved in cytokine-cytokine receptor interactions, the Hippo and WNT signaling pathways, and stem cell pluripotency (Fig. 3i). Consistent with previous studies (Wang et al. 2013, Zhao et al., Zhang et al. 2022c, Zhao et al. 2023, Zhang et al. 2024), these results suggest that SERPINE1 and SLC43A1 may function as potential oncogenes that regulate CRC through different biological processes.
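The selection criterion used in the case study (log2 fold-change > 1 and P < .05, satisfied in both datasets) amounts to filtering each differential expression table and intersecting the results. The Python sketch below illustrates this logic only; the function names are hypothetical, and the gene labels and statistics are placeholders rather than values from GSE211496 or TCGA-COAD.

```python
def upregulated(results, min_log2fc=1.0, max_p=0.05):
    """results maps a gene to (log2 fold-change, P value); return the upregulated genes."""
    return {
        gene
        for gene, (log2fc, p_value) in results.items()
        if log2fc > min_log2fc and p_value < max_p
    }

def shared_upregulated(result_tables, min_log2fc=1.0, max_p=0.05):
    """Genes that pass the thresholds in every table."""
    gene_sets = [upregulated(table, min_log2fc, max_p) for table in result_tables]
    return set.intersection(*gene_sets) if gene_sets else set()

# Placeholder tables standing in for two independent datasets.
dataset_a = {"GENE_A": (2.0, 0.01), "GENE_B": (1.5, 0.20), "GENE_C": (0.5, 0.001), "GENE_D": (3.0, 0.04)}
dataset_b = {"GENE_A": (1.2, 0.03), "GENE_D": (0.9, 0.01)}
candidates = shared_upregulated([dataset_a, dataset_b])
```

Requiring a feature to pass the thresholds in two independent datasets is a simple way to reduce dataset-specific false positives before downstream correlation, survival, and ROC analyses.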

Discussion
As demonstrated herein, ExpOmics is an easy-to-use web platform that allows biologists to perform powerful multi-omics data analysis. Compared to existing web platforms, ExpOmics has a wider range of applications and stronger analytical capabilities (Table 3). For instance, compared to mainstream multi-omics web platforms, such as ExpressVis (Liu et al. 2022), eVITTA (Cheng et al. 2021), and OmicsAnalyst (Zhou et al. 2021), ExpOmics supports the expression data of genes, mRNAs, and proteins as well as non-coding RNA expression data, including lncRNAs, circRNAs, miRNAs, and piRNAs. Moreover, ExpOmics can accommodate expression data without species restrictions, whereas ExpressVis and eVITTA only support data from model organisms, such as Homo sapiens, Mus musculus, and Rattus norvegicus. Although OmicsAnalyst (Zhou et al. 2021) can display expression data without species restrictions, it lacks the capacity to perform functional enrichment. ExpressVis, eVITTA, and OmicsAnalyst also lack WGCNA functionality, whereas ExpOmics provides this feature. WGCNA reveals intricate relationships between diverse gene features and elucidates how these relationships correlate with phenotypic traits (Langfelder and Horvath 2008). Compared to ExpOmics, the capabilities of these platforms for differential expression and co-expression analyses are limited, and they have not developed functional modules for the integrated analysis of TCGA data. Altogether, ExpOmics offers a broader application scope and more comprehensive functionality than similar existing web platforms.

Conclusion
In the foreseeable future, we plan to enhance ProteinExplyzer in ExpOmics by integrating a protein-protein interaction network analysis function. The web platform will also be updated to accommodate expression data at larger scales, such as single-cell and spatial transcriptomes, and to support more custom parameters, such as plot size, plot color scheme, user-defined gene matrix transposed files, and NCBI GEO platform files. We are committed to the ongoing maintenance and improvement of the analytical and visualization capabilities of ExpOmics, and we expect these updates to significantly boost its utility in the analysis of multi-omics data.

Table 2.
Details of the implementation and utilization of the 4 toolkits in ExpOmics.
• Input gene feature(s): One or more gene feature(s)
• A table for the enriched GO terms
• Multiple GSEA plots
• A ridge plot
BC = biological processes GO terms; CC = cell components GO terms; MF = molecular functions GO terms; GO = Gene Ontology; KEGG = Kyoto Encyclopedia of Genes and Genomes; LASSO = Least Absolute Shrinkage and Selection Operator; ROC = Receiver Operating Characteristic; TCGA = The Cancer Genome Atlas; WGCNA = Weighted Gene Co-expression Network Analysis.

Table 3.
Comparison of application scope and functionality between ExpOmics and other web platforms.
Note: "-," the data and function feature is not provided or cannot be statistically calculated; "✓," the data and function feature is provided by the database; "×," the data and function feature is not provided by the database; GO = Gene Ontology; KEGG = Kyoto Encyclopedia of Genes and Genomes; LASSO = Least Absolute Shrinkage and Selection Operator; ROC = Receiver Operating Characteristic; WGCNA = Weighted Gene Co-expression Network Analysis; PPI = Protein-Protein Interaction.